Abstract

An algorithm for exact maximum likelihood (ML) decoding on tail-biting trellises is presented, which exhibits
very good average case behavior. An approximate variant is proposed, whose simulated performance is observed
to be virtually indistinguishable from the exact one at all values of signal-to-noise ratio, and which effectively
performs computations equivalent to at most two rounds on the tail-biting trellis. The approximate algorithm is
analyzed, and the conditions under which its output is different from the ML output are deduced. The results of
simulations on an AWGN channel for the exact and approximate algorithms on the 16-state tail-biting trellis for
the (24,12) extended Golay code, and tail-biting trellises for two rate 1/2 convolutional codes with memories
of 4 and 6 respectively, are reported. An advantage of our algorithms is that they do not suffer from the effects
of limit cycles or the presence of pseudocodewords.

I. Introduction 

Tail-biting trellises are perhaps the simplest instances of decoding graphs with cycles. A tail-biting trellis 
has a Tanner graph [31] with a single cycle and usually approximate algorithms are used for decoding, as 
exact algorithms are believed to be too expensive. These approximate algorithms iterate around the trellis 
until either convergence is reached, or for a preset number of cycles. To the best of our knowledge, no exact 
decoding algorithms other than the brute force algorithm have been proposed so far for the general case, though 
there are several approximate algorithms for maximum-likelihood decoding [28], [22], [34], [33], [7], [20] and 
exact algorithms for bounded distance decoding [4]. The problem of Maximum A-Posteriori Probability (MAP) 
decoding is not addressed here. We propose an exact recursive algorithm, which exhibits very good average case 
behavior. The algorithm exploits the fact that a linear tail-biting trellis can be viewed as a coset decomposition 
of the group corresponding to the linear code with respect to a specific subgroup and is based on the A*
algorithm [23]. We also propose two approximate variants that always converge, and observe their performance 
on tail-biting trellises for the (24,12) extended Golay code and two convolutional codes of rate 1/2 and memory 
of 4 and 6 respectively. The performance of the first approximate variant is indistinguishable from that of the 
exact algorithm in terms of bit error rate for the two convolutional codes, and it is guaranteed to update each 
node in the tail-biting trellis at most twice, i.e., it performs a computation equivalent to at most two rounds on the
trellis. Section II briefly mentions related work. Section III provides some background. Section IV describes
the algorithm, while Section V analyses the algorithm. Section VI describes the approximate algorithm and
provides an analysis for its good performance. Section VII reports the results of simulations on an AWGN
channel and Section VIII concludes the paper.

II. Related Work 

Aji et al. [3] have shown that iterative maximum-likelihood (ML) decoding on tail-biting trellises will 
asymptotically converge to exact maximum likelihood decoding for certain codes. They provide experimental 
evidence that practically ML decoding is achieved for the (8, 4) Hamming code with five rounds of the tail-biting
trellis. The presence of pseudocodewords sometimes results in sub-optimal decoding and it is also possible to 
have situations where the iterative message passing algorithm does not converge. Several maximum likelihood 
decoding algorithms on tail-biting trellises have been proposed without a theoretical analysis [22], [33], [34], 
[30], [28], [20], but with good experimental results. Most of these are sub-optimal algorithms in that they 
may not produce the exact maximum-likelihood result on termination. Anderson and Hladik [4] have given an 
algorithm that is optimal for bounded distance decoding. The A* algorithm [23] has been used for maximum 
likelihood soft decision decoding on conventional trellises for block codes [10], [9], [19], [11], [12], [26]. In 
[10] Han et al. propose the use of the A* algorithm for ML decoding of block codes on their conventional 
trellises and report significant experimental gains in decoding complexity for signal to noise ratios ranging 
from 5 dB to 8 dB. This algorithm has been analyzed in [14] and shown to be efficient for many practical 
communication systems. In [11] a modified algorithm is proposed which searches through error patterns instead 
of codewords and similar gains are reported. Heuristic search algorithms are proposed in [26] which combine 
previously proposed algorithms and are able to outperform other practical decoders. A tutorial paper on the 
application of the A* algorithm to soft decision decoding appears in [9]. Sorokine and Kschischang [19] propose 
a metric called the variable bias term that is used in an A* algorithm, which has low computational complexity. 
Aguado and Farrell [1] discuss modified sequential algorithms on conventional trellises for block codes, which 
offer reduced complexity in comparison with the original stack algorithm [15] for sequential decoding. Han 
et al. [13] propose a trellis based ML soft-decision decoder for convolutional codes which uses a stack and a 
metric that ensures ML decoding. 

III. Background 

We first present some background on tail-biting trellises. Tail-biting trellises for convolutional codes were 
introduced in [30]. Minimal tail-biting trellises for block codes have been discussed in [6], [16], [17]. 



Definition 3.1: A tail-biting trellis $T = (V, E, \mathbb{F}_q)$ of depth $n$ is an edge-labeled directed graph with the
property that the set $V$ can be partitioned into $n$ vertex classes

$$V = V_0 \cup V_1 \cup \dots \cup V_{n-1} \qquad (1)$$

such that every edge in $T$ is labeled with a symbol from the alphabet $\mathbb{F}_q$, and begins at a vertex of $V_i$ and
ends at a vertex of $V_{(i+1) \bmod n}$, for some $i \in \{0, 1, \dots, n-1\}$.

We identify the set of time indices with $\mathbb{Z}_n$, the residue classes of integers modulo $n$. An interval of indices $[i, j]$
represents the sequence $\{i, i+1, \dots, j\}$ if $i < j$, and the sequence $\{i, i+1, \dots, n-1, 0, \dots, j\}$ if $i > j$.
Every cycle in $T$ starting at a vertex of $V_0$ defines a vector $(a_1, a_2, \dots, a_n) \in \mathbb{F}_q^n$ which is an edge-label
sequence. We assume that every vertex and every edge in the tail-biting trellis lies on some cycle, that is, the
tail-biting trellises we are dealing with are reduced [17]. The trellis $T$ represents a block code $C$ over $\mathbb{F}_q$ if the
set of all edge-label sequences in $T$ is equal to $C$. Let $C(T)$ denote the code represented by a trellis $T$.

A linear tail-biting trellis for an $(n, k)$ linear block code $C$ over $\mathbb{F}_q$ can be constructed as a trellis product [18]
of the representation of the individual trellises (called elementary trellises) corresponding to each of the $k$
rows of the generator matrix $G$ for $C$ [17]. Let $T_1$ and $T_2$ be the component trellises. The set of vertices
$V_i(T_1 \times T_2)$ of the product trellis $T_1 \times T_2$ at time index $i$ is just the Cartesian product of the vertices of
the component trellises. Thus $V_i(T_1 \times T_2) = V_i(T_1) \times V_i(T_2)$. Consider $E_i(T_1) \times E_i(T_2)$, and interpret an
element $((v_1, a_1, v_1'), (v_2, a_2, v_2'))$ in this product, where $v_i, v_i'$ are vertices and $a_1, a_2$ edge labels, as the edge
$((v_1, v_2), a_1 + a_2, (v_1', v_2'))$ where $+$ denotes addition in the field $\mathbb{F}_q$. If we define the $i$th section as the set
of edges connecting the vertices at time index $i$ to those at time index $i + 1$, then the edge count in the $i$th
section is the product of the edge counts in the $i$th section of the individual trellises.
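
The product of one section can be sketched in a few lines of Python. This is only an illustration: the edge representation as (tail, label, head) triples, the function name `product_section`, and the two toy sections are assumptions made here, and labels are added modulo q, which matches field addition only when q is prime.

```python
from itertools import product

def product_section(edges1, edges2, q=2):
    """Product of one section of two trellises.

    Each edge is a (tail, label, head) triple. Labels are added modulo q,
    which is field addition only for prime q (an assumption of this sketch).
    """
    return [((u1, u2), (a1 + a2) % q, (v1, v2))
            for (u1, a1, v1), (u2, a2, v2) in product(edges1, edges2)]

# Hypothetical toy sections over F_2: a two-state section and a one-state section.
section_a = [(0, 0, 0), (1, 1, 1)]
section_b = [(0, 0, 0), (0, 1, 0)]
section_ab = product_section(section_a, section_b)
```

The resulting section has as many edges as the product of the two edge counts, and its vertices are pairs of component vertices, as stated above.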

Let $\{g_1, g_2, \dots, g_k\}$ be the rows of a generator matrix $G$ for the linear code $C$. Each vector $g_i$ generates
a one-dimensional subcode of $C$, which we denote by $\langle g_i \rangle$. Therefore $C = \langle g_1 \rangle + \langle g_2 \rangle + \dots + \langle g_k \rangle$, and the
trellis representing $C$ is given by $T = T_1 \times T_2 \times \dots \times T_k$, where $T_i$ is the trellis for $\langle g_i \rangle$, $1 \le i \le k$. To
specify the component trellises in the trellis product above, we will need to introduce the notions of linear [18]
and circular spans [17] and elementary trellises [18], [17]. Given a codeword $c = (c_1, c_2, \dots, c_n) \in C$, the linear
span of $c$ is the smallest interval $[i, j]$, $i, j \in I = \{1, 2, \dots, n\}$, $i \le j$, which contains all the non-zero positions of
$c$. A circular span has exactly the same definition with $i > j$. Note that for a given vector, the linear span is
unique, but circular spans are not: they depend on the runs of consecutive zeros chosen for the complement of
the span with respect to the index set $I$. For a vector $x = (x_1, \dots, x_n)$ over the field $\mathbb{F}_q$ and a specified span
$[i, j]$ there is a unique linear elementary trellis representing $\langle x \rangle$ [17]. This trellis has $q$ vertices at time indices
$i$ to $(j - 1) \bmod n$, and a single vertex at other positions. Consequently, $T_i$ in the trellis product mentioned
earlier is the elementary trellis representing $\langle g_i \rangle$ for some choice of span (either linear or circular). Koetter and
Vardy [17] have shown that any linear trellis, conventional or tail-biting, can be constructed from a generator
matrix whose rows can be partitioned into two sets, those which have linear span, and those taken to have
circular span. The trellis for the code is formed as a product of the elementary trellises corresponding to these
rows. We will represent such a generator matrix as

$$G_{KV} = \begin{bmatrix} G_l \\ G_c \end{bmatrix}$$

where $G_l$ is the submatrix consisting of rows with linear span, and $G_c$ the submatrix of rows with circular span.



Definition 3.2: For a vector $v$ of circular span $[i, j]$ in $G_c$, the interval $[j \bmod n, (i - 1) \bmod n]$ is called
the zero run of the vector.

The path in the trellis corresponding to this vector shares all states at time indices in the zero run with the 
path corresponding to the all-zero codeword in the product trellis. 

For example, consider the codeword 0100011 with circular span [6, 2]. This has zero run [2, 5]. The elementary
trellis corresponding to this vector has state cardinality profile (2, 2, 1, 1, 1, 1, 2). (Recall the time indices are
numbered from 0 to $n - 1$ where $n$ is the length of the code).
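
The zero run and the state cardinality profile of an elementary trellis follow mechanically from the span. The short Python sketch below reproduces the numbers of this example; the function names `zero_run` and `state_profile` are illustrative choices, not notation from the paper.

```python
def zero_run(i, j, n):
    """Zero run [j mod n, (i - 1) mod n] of a vector with circular span [i, j]."""
    return (j % n, (i - 1) % n)

def state_profile(i, j, n, q=2):
    """State cardinality profile of the elementary trellis for span [i, j].

    The trellis has q vertices at time indices i, i + 1, ..., (j - 1) mod n
    and a single vertex at every other time index.
    """
    profile = [1] * n
    t = i % n
    while t != j % n:
        profile[t] = q
        t = (t + 1) % n
    return tuple(profile)

run = zero_run(6, 2, 7)
profile = state_profile(6, 2, 7)
```

For the codeword 0100011 with circular span [6, 2] and n = 7, `run` is (2, 5) and `profile` is (2, 2, 1, 1, 1, 1, 2), in agreement with the text.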

As an example we display a tail-biting trellis for a binary (7, 4) Hamming code. Though this is not a 
minimal trellis for the code, it serves to illustrate some of the definitions above. The spans of the rows are 
shown alongside the rows. All spans with i greater than j are circular spans. 

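Example 3.1: Let $C$ be a $(7, 4)_2$ Hamming code with a product generator matrix $G_{KV}$ defined as

$$G_{KV} = \begin{bmatrix} 1 & 0 & 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 & 1 & 1 \\ 0 & 1 & 0 & 0 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 & 0 & 1 \end{bmatrix} \quad \begin{matrix} [1,6] \\ [3,7] \\ [6,2] \\ [7,4] \end{matrix}$$

The product tail-biting trellis for this generator matrix is given in Figure 1.
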

Fig. 1. A product tail-biting trellis for the (7,4) binary Hamming code. 



Definition 3.3: A subtrellis of a tail-biting trellis consists of a start node at time index zero and all edges and
nodes which can be traversed in any cycle of the graph that begins and ends at this start node.

Let $T_l$ denote the minimal conventional trellis for the code generated by $G_l$. Clearly $T_l$ is a subtrellis of the
tail-biting trellis. If $l$ is the number of rows of $G$ with linear span and $c$ the number of rows of circular span,
the tail-biting trellis constructed using the product construction will have $q^c$ start states. Each such start state
defines a subtrellis whose codewords form a coset of the subcode corresponding to the subtrellis containing
the all-zero codeword. The coset structure is well known and has been reported in [29], [24], [8], [27], [30]. Each
vector in the circular span can be considered to be a coset leader. The set of zero runs of the coset leaders
determines the structure of the tail-biting trellis in the following way. If a coset leader has zero run $[i, j]$ then
the subtrellis associated with that coset shares all states at time indices in the interval $[i, j]$ with the subtrellis
corresponding to the subcode defined by vectors of linear span. Further, we recall, the coset leader shares all
states in the interval $[i, j]$ with the states corresponding to the all-zero codeword.

Fig. 2. Subtrellis $T_1 = T_l$ of the tail-biting trellis for the (7,4) Hamming code in Figure 1 for vectors of linear span

Fig. 3. Subtrellis $T_2$ corresponding to coset leader 0100011 with zero-run [2,5]

Fig. 4. Subtrellis $T_3$ corresponding to coset leader 0111001 with zero-run [4,6]

Fig. 5. Subtrellis $T_4$ corresponding to coset leader 0011010 with zero-run [4,5]



The four subtrellises of the tail-biting trellis of Figure 1 are shown in Figures 2, 3, 4 and 5 along with their
associated coset leaders and zero runs.

Definition 3.4: If subtrellises $T_1$ and $T_2$ share states from time indices $i$ to $j$ then the interval $[i, j]$ is called
the merging interval of $T_1$ and $T_2$.

It is easy to see that two subtrellises do not share any states outside their merging interval. 

A tail-biting trellis is said to satisfy the intersection property if the intersection of all the zero runs of the
members of $G_c$ is non-empty. The tail-biting trellis for the Hamming code given in Example 3.1 satisfies the
intersection property as the interval [4, 5] is contained in the intersection of all the zero runs of $G_c$.
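
A small Python check of the intersection property, using the two zero runs [2, 5] and [4, 6] quoted for the coset leaders 0100011 and 0111001. The helper names `run_indices` and `common_zero_run` are assumptions of this sketch.

```python
def run_indices(start, end, n):
    """Time indices of the cyclic interval [start, end] taken modulo n."""
    length = (end - start) % n + 1
    return {(start + k) % n for k in range(length)}

def common_zero_run(zero_runs, n):
    """Intersection of all zero runs; non-empty means the property holds."""
    common = set(range(n))
    for start, end in zero_runs:
        common &= run_indices(start, end, n)
    return common

shared = common_zero_run([(2, 5), (4, 6)], 7)
```

Here `shared` is the set {4, 5}, so the intersection property holds for this example.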

IV. Decoding 

The decoding algorithm proposed here is different from the sub-optimal algorithms mentioned in Section II
that go round and round the tail-biting trellis updating all the nodes of the trellis in every round. It makes one 
round of the tail-biting trellis and subsequently judiciously uses the information gathered to further update as 
few nodes as it can before it closes in on the most likely codeword. Our algorithm has two phases. In the first 
phase a Viterbi algorithm is performed on the tail-biting trellis. This phase performs computations at every 
node of the tail-biting trellis. In the second phase however, only one path is tracked at a time, this being the 
most likely path. The initial estimate of the most likely path is obtained from the first phase. This path is 
present in some subtrellis and is followed until the algorithm decides that some other path (perhaps in another 
subtrellis) looks more promising based on some metric. When such a situation is encountered, computation on 
this path is suspended and the more promising path is taken up. While this strategy at first glance looks like 
the stack algorithm [15] for decoding convolutional codes, it differs from it because it has the property that it 
always delivers the optimal path as the metric used satisfies the property required by the A* algorithm. (We 
will prove this property formally). 

For purposes of decoding we use the unrolled version of the trellis with start states $s_1, s_2, \dots, s_l$ and final
states $f_1, f_2, \dots, f_l$ where $l$ is the number of subtrellises. An $(s_i, f_i)$ path is a path from start vertex $s_i$ to
final vertex $f_i$, and is consequently a codeword path in trellis $T_i$, whereas an $(s_i, f_j)$ path for $i \ne j$ is a
non-codeword path as it starts in subtrellis $T_i$ and ends in subtrellis $T_j$. For purposes of our discussion we term
the label sequence along such a path a semicodeword.

Maximum-likelihood decoding for a tail-biting trellis is equivalent to finding the codeword closest to the
received sequence measured in terms of a soft decision metric. Assume that the channel is modeled as an
additive white Gaussian noise (AWGN) channel and that antipodal signaling is used for communication. A
binary code digit 0 is mapped into $\sqrt{E_s}$ and a 1 is mapped into $-\sqrt{E_s}$, where $E_s$ is the signal energy per bit
entering the channel. For a discrete AWGN channel we have

$$r_t = x_t + n_t$$

where $r_t$ is the received signal at time $t$, $x_t$ is the transmitted signal and $n_t$ is the value of a white Gaussian
noise random variable with variance $N_0/2$, where $N_0$ is the noise spectral density. Without loss of generality
we can assume that $E_s = 1$. The signal-to-noise ratio or SNR is the quantity $E_s/N_0$. The decoder uses the
received vector $r$ to determine which codeword was transmitted. It forms an estimate $\hat{x}$ of the codeword $x$
that was transmitted. A decoding error occurs if $\hat{x} \ne x$. The maximum likelihood decoding rule is to decode
the received sequence $r$ to codeword $x_m$ whenever $p(r \mid x_m) \ge p(r \mid x_l)$ for all $l \ne m$, where $p(r \mid x_m)$ is the
conditional probability of $r$ given $x_m$. Let $S(x)$ be the signal vector corresponding to the codeword $x$. If
$d_E(S(x_m), r)$ is the Euclidean distance between $S(x_m)$ and $r$, then the maximum likelihood decoding rule for
decoding binary linear block codes transmitted over the AWGN channel using antipodal signaling is to decode
$r$ into codeword $x_m$ whenever $d_E(S(x_m), r) \le d_E(S(x_l), r)$ for all $l \ne m$.
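
As a reference point, the decoding rule can be written as a brute-force search over all codewords. The Python sketch below assumes $E_s = 1$ and uses a hypothetical three-bit repetition code and made-up received values; it is the exhaustive baseline, not the trellis algorithm proposed in this paper.

```python
def signal(codeword):
    """Antipodal mapping with E_s = 1: bit 0 -> +1, bit 1 -> -1."""
    return [1.0 - 2.0 * b for b in codeword]

def squared_distance(s, r):
    """Squared Euclidean distance; minimizing it also minimizes d_E."""
    return sum((si - ri) ** 2 for si, ri in zip(s, r))

def ml_decode(codewords, r):
    """Return the codeword whose signal vector is closest to r."""
    return min(codewords, key=lambda x: squared_distance(signal(x), r))

code = [(0, 0, 0), (1, 1, 1)]      # hypothetical toy code
received = [0.9, -0.2, 0.4]        # made-up channel output
estimate = ml_decode(code, received)
```

The cost of this search grows with the number of codewords, which is why the decoding problem is recast as a shortest path problem on the trellis.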

The decoding algorithm is thus cast as a shortest path problem in which each path is associated with a
metric, and the problem is to find a codeword path with minimum metric. The A* algorithm is used to cut
down the search space. It does so by using a node metric to guide the search; this metric is the sum of the length
of the shortest path from the source to a node and an underestimate of the length of the shortest path from the
node to the goal node. As mentioned earlier, only one path is explored at a time and the algorithm derives
its advantage from the fact that if the estimates used are close to the actual values then the search space that
yields the optimal path is greatly reduced. We give the algorithm below. The algorithm maintains two sets of
vertices, $S$ and $\bar{S}$. The set $S$ is the set of closed nodes and represents nodes to which the shortest paths have
been finalized. At any iteration, the set $\bar{S}$ is the set of candidate nodes, the best of which will be closed in the
succeeding iteration. These are called the open or visited nodes. An operation of expanding a node consists of
the following three steps:

1. Getting all the immediate successors of the node.

2. Checking for each immediate successor if this successor has been visited before.

3. If the successor has been visited then updating the minimum cost path to the successor by taking the
minimum of the cost of the previous path and the cost of this one.

All the expanded nodes are put into the closed set and the visited nodes are put into the open set. When the
goal node is reached an optimal path has been found.

The following is a formal description of the algorithm. Line 1 performs the initialization of the sets and the
costs and paths. Line 3 selects the vertex to be expanded. Line 4 puts the selected vertex into the closed set
and deletes it from the open set. Line 5 detects if the algorithm has completed; lines 6 through 12 perform an
expansion of a node. They update the cost of an immediate successor as well as the best path to that successor
and mark the successor as visited by putting it into the open set.

Algorithm A*

Input: A trellis $T = (V, E, l)$ where $V$ is the set of vertices, $E$ is the set of edges and $l(u, v) \ge 0$ for edge
$(u, v)$ in $E$, a source vertex $s$ and a destination vertex $f$, and an estimate $e(u, f)$ for the shortest path from $u$
to $f$ for each vertex $u \in V$.
Output: The shortest path from $s$ to $f$.

/* $cost(u)$ is the cost of the current shortest path from $s$ to $u$ and $P(u)$ is a current shortest path from $s$ to $u$ */

begin

1. $S \leftarrow \emptyset$, $\bar{S} \leftarrow \{s\}$, $cost(s) \leftarrow 0$, $P(u) \leftarrow ()$, $\forall u \in V$, $cost(u) = +\infty$, $\forall u \ne s$;

2. repeat

3. Let $u$ be the vertex in $\bar{S}$ with minimum value of $cost(u) + e(u, f)$.

4. $S \leftarrow S \cup \{u\}$; $\bar{S} \leftarrow \bar{S} \setminus \{u\}$;

5. if $u = f$ then return $P(f)$;

6. for each $(u, v) \in E$ do

7. if $v \notin S$ then

8. begin

9. $cost(v) \leftarrow \min(cost(u) + l(u, v), previous(cost(v)))$;

10. if $cost(v) \ne previous(cost(v))$ then append $(u, v)$ to $P(u)$ to give $P(v)$;

11. $\bar{S} \leftarrow \bar{S} \cup \{v\}$;

12. end

13. forever

end

The A* algorithm is guaranteed to output the shortest path if the following two conditions hold. Let $L_T(u, f)$
be the shortest path length from $u$ to $f$ in $T$. First, $e(u, f)$ must be a lower bound, that is, $e(u, f) \le L_T(u, f)$.
Second, $e(u, f)$ must satisfy the inequality $l(u, v) + e(v, f) \ge e(u, f)$ for $u$ a predecessor of $v$.
If both the above conditions are satisfied, then the algorithm A*, on termination, is guaranteed to output a
shortest path from $s$ to $f$.
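
A compact Python rendering of the A* search may help fix ideas. It is a sketch under assumptions: the graph is a dictionary of adjacency lists, the estimates are supplied as a dictionary `e`, the open set is a binary heap with stale entries skipped on removal, and the toy graph and its estimates are invented for illustration (they satisfy both conditions above).

```python
import heapq

def a_star(edges, s, f, e):
    """Shortest path from s to f.

    edges maps u to a list of (v, length) pairs; e maps u to the estimate
    e(u, f). Returns (cost, path), or None if f is unreachable.
    """
    cost = {s: 0.0}
    path = {s: [s]}
    closed = set()                      # the set S of closed nodes
    open_heap = [(e[s], s)]             # the open set, keyed by cost + estimate
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u in closed:
            continue                    # stale heap entry
        closed.add(u)
        if u == f:
            return cost[f], path[f]
        for v, length in edges.get(u, []):
            if v in closed:
                continue
            new_cost = cost[u] + length
            if new_cost < cost.get(v, float("inf")):
                cost[v] = new_cost
                path[v] = path[u] + [v]
                heapq.heappush(open_heap, (new_cost + e[v], v))
    return None

# Hypothetical graph and estimates.
graph = {"s": [("a", 1.0), ("b", 4.0)],
         "a": [("b", 1.0), ("f", 5.0)],
         "b": [("f", 1.0)]}
estimate = {"s": 2.0, "a": 1.5, "b": 1.0, "f": 0.0}
result = a_star(graph, "s", "f", estimate)
```

On this graph the search closes s, a, b and f in that order and returns the path s, a, b, f of length 3.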

The algorithm proposed here is a variant of the A* algorithm, which at any given instant, is executing an A* 
algorithm on exactly one of the subtrellises, with perhaps suspended executions of the algorithm on a set of 
other subtrellises. The subtrellis on which the algorithm is currently executing appears the best in its potential to
deliver the minimal cost path. Since the algorithm is not straightforward, we first give an informal explanation 
of how it works. The algorithm has two phases. The first phase performs a Viterbi algorithm on the tail- 
biting trellis and examines surviving paths, called survivors here, at all states of the tail-biting trellis. The first 
phase is described below. Let *e denote the initial vertex of edge e. Let e* denote the vertex entered via edge e. 

Algorithm First Phase

Input: An unrolled tail-biting trellis with start nodes $s_1, s_2, \dots, s_l$, final nodes $f_1, f_2, \dots, f_l$ for the $l$
subtrellises, and an edge cost $c(e)$ associated with each edge $e$ of the tail-biting trellis.
Output: The cost $cost(v)$ of a least cost path to each node $v$ from any start node.
begin

    for each node $v$ in the tail-biting trellis initialize $cost(v) = 0$;
    for $i = 1$ to $n$ do

        for each vertex $v$ at time index $i$ do

            $cost(v) = \min_{e \,:\, e^* = v} \{cost({}^*e) + c(e)\}$
    for $j = 1$ to $l$ do

        $metric(T_j) = cost(f_j)$;

end

At the end of the first phase therefore we have a set of survivors at final nodes $f_1, f_2, \dots, f_l$, some of
which may not correspond to codewords. The costs of these paths are taken as initial estimates for the second
phase. We first informally describe the second phase below and then describe a recursive version in more detail.

Algorithm Second Phase

Input: The initial metrics $metric(T_i)$, $i = 1, \dots, l$, computed in the first phase for the $l$ subtrellises and the costs
$cost(v)$ of the survivors at all vertices $v$ of the tail-biting trellis.
Output: The maximum likelihood path.

1. Sort the metrics $metric(T_i)$, $i = 1, \dots, l$, in increasing order; if the lowest metric is that of a codeword
path then output that path as the ML path and return, else go to the next step.

2. low = cost of the lowest codeword survivor if there is one, otherwise low = $\infty$.

3. If any of the metrics $metric(T_j)$ is greater than low then discard subtrellis $T_j$ from the set of participants
in the second phase.

4. Residual-trellises = set of all non-discarded trellises with non-codeword survivors;

5. Create a set $\bar{S}$ of the initial vertices, along with their metrics, of all residual trellises, and let the start node
$s$ of the A* algorithm be the start node of the residual trellis with a minimum initial metric;

6. Execute lines 2 to 11 of Algorithm A*, modifying the statement in line 11 to: if $cost(v) < low$ then
$\bar{S} \leftarrow \bar{S} \cup \{v\}$, and replacing the test $u = f$ in line 5 by $u \in \{f_1, f_2, \dots, f_l\}$.

7. If the open set $\bar{S}$ becomes empty before a final node is reached, then the codeword with cost low is
output as the decoder's estimate of the transmitted codeword.

The algorithm above is therefore different from the standard A* algorithm in the following ways: 

1) It may switch from one subtrellis to another depending on which subtrellis the node with minimum metric 
is located in. 

2) Each shared node in a subtrellis is regarded as a distinct node for purposes of the algorithm. Thus there 
will be as many distinct copies of a given node of the tail-biting trellis as there are residual subtrellises 
sharing that node. 

3) Before adding an element to the open set, we check to see that its metric is less than that of the best 
codeword survivor stored in low. In the traditional algorithm there is no such check. 

4) If the open set $\bar{S}$ becomes empty before a final node is reached then the codeword with cost low is
output.
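
The two phases can be put together in a short Python sketch. Several simplifications are assumptions made here and are not the authors' formulation: the unrolled trellis is a list of sections, each a list of (tail, head, label, cost) edges; start state j and final state j carry the same label, so an (s_j, f_j) path is a codeword path; and the second phase is written as a plain A* over (subtrellis, time, state) copies rather than with the recursive procedure given later. The estimate is the first-phase metric of the subtrellis minus the first-phase survivor cost at the node, and the toy trellis is invented.

```python
import heapq

INF = float("inf")

def first_phase(sections, start_states):
    """One Viterbi pass started from every start state with cost 0."""
    n = len(sections)
    cost1 = [dict() for _ in range(n + 1)]   # survivor cost at each node
    surv = [dict() for _ in range(n + 1)]    # (origin start state, labels)
    for s in start_states:
        cost1[0][s], surv[0][s] = 0.0, (s, ())
    for t, edges in enumerate(sections):
        for u, v, label, c in edges:
            if u in cost1[t] and cost1[t][u] + c < cost1[t + 1].get(v, INF):
                cost1[t + 1][v] = cost1[t][u] + c
                surv[t + 1][v] = (surv[t][u][0], surv[t][u][1] + (label,))
    return cost1, surv

def decode(sections, start_states):
    """Return (cost, labels) of a least cost codeword path."""
    n = len(sections)
    cost1, surv = first_phase(sections, start_states)
    metric = {j: cost1[n].get(j, INF) for j in start_states}
    low, best = INF, None                    # best codeword survivor so far
    for j in start_states:
        if j in surv[n] and surv[n][j][0] == j and metric[j] < low:
            low, best = metric[j], surv[n][j][1]
    if low <= min(metric.values()):
        return low, best                     # lowest metric is a codeword
    succ = [dict() for _ in range(n)]
    for t, edges in enumerate(sections):
        for u, v, label, c in edges:
            succ[t].setdefault(u, []).append((v, label, c))
    # Residual subtrellises: non-codeword survivor with metric below low.
    heap = [(metric[j], 0.0, j, 0, j, ()) for j in start_states
            if metric[j] < low]
    heapq.heapify(heap)
    closed = set()
    while heap:
        _, g, j, t, u, labels = heapq.heappop(heap)
        if (j, t, u) in closed:
            continue
        closed.add((j, t, u))
        if t == n:
            return g, labels                 # reached f_j in subtrellis j
        for v, label, c in succ[t].get(u, ()):
            if t + 1 == n and v != j:
                continue                     # path would not close in T_j
            g2 = g + c
            f2 = g2 + metric[j] - cost1[t + 1][v]
            if f2 < low:                     # prune against codeword survivor
                heapq.heappush(heap, (f2, g2, j, t + 1, v, labels + (label,)))
    return low, best

# Hypothetical two-state trellis of depth 3.
sections = [
    [(0, 0, "0", 1.0), (0, 1, "1", 0.2), (1, 0, "1", 0.1), (1, 1, "0", 0.9)],
    [(0, 0, "0", 0.5), (0, 1, "1", 0.4), (1, 0, "1", 0.3), (1, 1, "0", 0.6)],
    [(0, 0, "0", 0.9), (0, 1, "1", 0.1), (1, 0, "1", 0.1), (1, 1, "0", 0.9)],
]
best_cost, best_labels = decode(sections, [0, 1])
```

In this toy trellis both first-phase survivors at the final nodes are non-codeword paths, so the second phase runs; it returns the path through states 1, 0, 0, 1 with cost 0.7.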

We need to define the estimate $e(u, f)$ in line 3 of Algorithm A*. Recall that this has to be an underestimate
of the path length from node $u$ to the final node if the ML path is to be output. The estimate we use for node
$v$ in subtrellis $T_j$ is the difference between the initial metric for trellis $T_j$ computed in the first phase and the
cost of the survivor at node $v$ in the first phase. We will prove later that this is indeed an underestimate and
therefore guarantees that the ML path is output on termination. We implement the open set $\bar{S}$ as a heap [2].
This ensures that the minimum element can be retrieved in constant time and that whenever an element is
inserted into the heap, restructuring it to preserve constant-time access to the minimum element has
complexity logarithmic in the size of the heap.

We now describe the second phase of the algorithm more formally beginning with the notation used. 

1. Variable $e(s_i, f_i)$ is the estimate obtained for the shortest path from the start to the final node in subtrellis
$T_i$ in the first phase.

2. Variable $e(v, f_i)$ is the estimate for the shortest path from node $v$ to node $f_i$ in subtrellis $T_i$, which is
computed when an update occurs at node $v$. This is the difference between the initial estimate at $s_i$ in trellis
$T_i$ and the cost of the survivor at node $v$ in the first phase.

3. Variable h is a pointer to a structure representing a node in the trellis; h.state is the state, h.trellis indicates
which trellis that state belongs to; h.metric stores the current metric, which is the sum of the length of the
path from the start node in trellis h.trellis to h.node and the estimate of the path length from h.node to the
final node in that trellis.

4. Variable succ is a pointer to the successor of a node; succ.state and succ.metric have meanings that can
be deduced from 3 above.

5. Variable index refers to the time index and takes on values from 0 to $n - 1$ where $n$ is the length of the
code.

6. Variable trellisnumber is a unique number associated with a subtrellis.

7. Function InsertHeap inserts a node into the heap; function DeleteMin extracts the node with minimum
value of metric from the heap.

8. Function IsEmpty returns a boolean value which is true if the heap is empty and false otherwise.

9. Variable node.cost represents the actual cost of the path from the start state of a subtrellis that ends at the
node node. Variable node.cost1 represents the cost of the survivor in the first phase at that node.

10. Variable metric is the updated metric at a successor of a node in a trellis using function Update, which
is called when that node is closed using Expand.

11. Variable P(state) is the sequence of nodes representing the winning path at the state state.

12. Variable low is the cost of the lowest cost $(s_i, f_i)$ path in the first phase.

13. Variable flag is used to detect whether the winning path is the one identified in the first or second phase.
It is initialized to 0. If the heap becomes empty without reaching a final node in the second phase then the
lowest cost $(s_i, f_i)$ path is output as flag remains 0. Else the path that first reaches a final node in the second
phase is the winning path.

function Second-Phase

/* Begin with r residual trellises whose metrics have been sorted in increasing order, and with variable low
which stores the metric of the best codeword survivor */

begin

    /* First create a heap H with these r metrics; each element of the heap is a record containing the trellis
    number, the node, the time index, and the metric */
    for i = 1 to r do
        InsertHeap(H, i, startVertex($T_i$), 0, $e(s_i, f_i)$)
    endfor

    flag = 0;

    while IsEmpty(H) = false and flag = 0 do
        h := DeleteMin(H)
        S := S ∪ {h.node} /* Add h.node to the set of closed nodes */
        Expand(h.trellisNo, h.state, h.timeindex, h.metric) /* Expand h.node */
    endwhile

    if flag = 0 then output the codeword with metric low; return

end

function Expand(trellisnumber, state, index, metric)
1. begin

2.   if index = n - 1 then flag = 1; output P(state); return

3.   else

4.     for each successor succ of state do

5.       Update(trellisnumber, state, succ.state, succ.metric, index)

6.       if succ.metric ≤ metric then S := S ∪ {succ.state};

7.         Expand(trellisnumber, succ.state, index, succ.metric)

8.       else

9.         if succ.metric < low

10.          then InsertHeap(H, trellisnumber, succ.state, index, succ.metric)

11.        endif
12.      endif

13.    endfor
14.  endif
15. end

function Update(i, node1, node2, metric, timeindex);
begin

    timeindex := timeindex + 1
    newcost := node1.cost + edgecost(node1, node2)
    if newcost < node2.cost then

        P(node2) := (P(node1), node2) /* update the current shortest path to node2 */
        node2.cost := newcost /* update the cost of the current shortest path to node2 */
        metric := node2.cost + $e(s_i, f_i)$ - node2.cost1 /* update the metric at node2; node2.cost1 is
        the cost of the survivor in the first phase */
    endif

end



V. Analysis of the Decoding Algorithm 
We first prove that on termination the algorithm always outputs the optimal path.

Lemma 5.1: Each survivor at a node $u$ has a cost which is a lower bound on the cost of the least cost path
from $s_j$ to $u$ in an $(s_j, f_j)$ path passing through $u$.

Proof: Assume that $u$ is an arbitrary node on an $(s_j, f_j)$ path and that path $P$ is the survivor at $u$ in the
first phase. There are two cases. Either $P$ is a path from $s_j$ to $u$ or $P$ is a path from $s_i$ to $u$, $i \ne j$. In the
former case $P$ is itself a least cost path from $s_j$ to $u$. In the latter case the cost of $P$ is no greater than the cost
of any path from $s_j$ to $u$; hence the cost of the survivor at $u$ is a lower bound on the cost of the least cost path
from $s_j$ to $u$. ■

Lemma 5.2: The quantity $e(u, f_j)$ defined in the algorithm satisfies the following two properties:

1) $e(u, f_j) \le L_{T_j}(u, f_j)$

2) $l(u, v) + e(v, f_j) \ge e(u, f_j)$ where $(u, v)$ is an edge.

Proof:

1) $e(u, f_j) = cost(survivor(f_j)) - cost(survivor(u))$.

Also $cost(survivor(f_j)) \le cost(survivor(u)) + L_{T_j}(u, f_j)$, from which the result follows.

2) To prove: $l(u, v) + e(v, f_j) \ge e(u, f_j)$.

LHS $= l(u, v) + e(v, f_j)$

$= l(u, v) + e(s_j, f_j) - cost(survivor(v))$

If the survivor at $v$ is the survivor at $u$ concatenated with edge $(u, v)$, then

LHS $= l(u, v) + e(s_j, f_j) - cost(survivor(u)) - l(u, v)$

$= e(u, f_j)$

On the other hand, if the survivor at $v$ is not a continuation of the survivor at $u$,

$cost(survivor(v)) \le cost(survivor(u)) + l(u, v)$
$cost(survivor(v)) - l(u, v) \le cost(survivor(u))$
or, $e(s_j, f_j) - cost(survivor(v)) + l(u, v) \ge e(s_j, f_j) - cost(survivor(u))$
or, $e(v, f_j) + l(u, v) \ge e(u, f_j)$
Therefore, $l(u, v) + e(v, f_j) \ge e(u, f_j)$.

■

Lemma 5.2 and the fact that all estimates on trellises on which execution is suspended are underestimates
assure us that if the final node is reached in any subtrellis then this is indeed the shortest path in the tail-biting
trellis, or in other words the ML codeword.

We first make a few observations about the algorithm. At any point in the second phase, the algorithm is
exploring some path in a candidate subtrellis called the current trellis, even though it may do so in discontinuous
steps. This path is called the current path in that subtrellis. The metric which it uses to decide whether to
continue on the current path on the current trellis, say $T_i$, or forsake it in favour of another path either in the
current trellis or on another candidate trellis is initially $e(s_i, f_i)$. We have the following lemma specifying how
the metric changes along the path.

Lemma 5.3: During the second phase, if the current path updates a node v using function Update, where 
the survivor in the first phase was not in the current subtrellis then the metric becomes e(sj, fi) + A(i,v) 
where A(i,v) is the difference between the cost of the least cost path ending at v in the current trellis and the 
survivor at v during the first pass. 
Proof: We know that 

$\mathrm{cost}(s_i, v) = \mathrm{cost}(s_i, u) + \mathrm{edgecost}(u, v)$ (2) 

and 

$e(v, f_i) = e(s_i, f_i) - \mathrm{cost}(\mathrm{survivor}(v))$ (3) 

The metric is just the sum of the two left-hand sides of the previous two equations. Thus if the survivor is the 
current path then 

$\mathrm{cost}(\mathrm{survivor}(v)) = \mathrm{cost}(s_i, u) + \mathrm{edgecost}(u, v)$ (4) 

and the lemma follows. If the survivor is not the current path then the metric is increased by the difference 
between the length of the current path up to $v$ and the length of the survivor at $v$. ■ 
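The step from Equations (2) and (3) to the statement of the lemma can be written out as follows. This is a worked restatement under the notation of the proof, with $\Delta(i,v)$ taken to be the excess of the current path cost over the first-phase survivor cost at $v$.

```latex
\begin{align*}
\text{metric at } v
  &= \mathrm{cost}(s_i, v) + e(v, f_i) \\
  &= \mathrm{cost}(s_i, u) + \mathrm{edgecost}(u, v)
     + e(s_i, f_i) - \mathrm{cost}(\mathrm{survivor}(v)) \\
  &= e(s_i, f_i) + \Delta(i, v), \\
\Delta(i, v)
  &= \mathrm{cost}(s_i, u) + \mathrm{edgecost}(u, v)
     - \mathrm{cost}(\mathrm{survivor}(v)) \;\geq\; 0 .
\end{align*}
```

When the survivor at $v$ is the current path, Equation (4) gives $\Delta(i,v) = 0$ and the metric stays at $e(s_i, f_i)$; otherwise the survivor is strictly cheaper and the metric rises by $\Delta(i,v)$.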

Definition 5.1: A critical node on a path in a subtrellis is one at which the metric for the subtrellis reaches 
its final value (i.e., the actual cost of the path). 

Lemma 5.4: During the second phase, once a critical node is closed in a subtrellis, the algorithm goes on 
to reach the final node in that subtrellis without switching trellises, and outputs an ML path. 

Proof: The critical node was closed because it had the minimum metric. The metric represents the actual 
cost of the path at a critical node. This is no greater than the metrics of all other visited nodes which are 
underestimates of the costs of all other paths. Thus once a critical node is closed, the metric does not change 
along the continuation of this winning path to the final node. Therefore line 6 of function Expand is always 
true at some successor and no trellis switching takes place. ■ 

The following properties hold for the metric. Let $m_i(N)$ denote the metric in subtrellis $i$ at node $N$: 

Lemma 5.5: Let an $(s_k, f_i)$ path be the winner at $f_i$ in the first phase and let it win over an $(s_i, f_i)$ path at 
node $A$. Then $m_i(A) = m_i(f_i)$ and $m_i(B) < m_i(f_i)$ for any proper predecessor $B$ of $A$. 

Proof: Since the $(s_k, f_i)$ path was the overall winner at $f_i$, its length will be the metric at the start node 
of trellis $T_i$, and by Lemma 5.3 the metric on the path in $T_i$ will rise by the appropriate amounts $\Delta_j$ at each 
node $j$ where the path was overtaken by a path from some other subtrellis. When it reaches node $A$, which 
is a critical node, the metric will reach its final value, namely $m_i(f_i)$. Since $B$ is a predecessor of $A$ and the 
metric rises at $A$, $m_i(B) < m_i(f_i)$. ■ 

For each shortest path in a subtrellis i, the nodes where it was overtaken by paths originating at the start 
nodes of other subtrellises in the first phase, are the nodes where its metric will rise during the second phase. 
These nodes are called rising points. Thus the node at the final rising point in a subtrellis is the critical node. 
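The notions of rising point and critical node can be illustrated on a single path. The sketch below assumes that the cumulative cost of the path in its own subtrellis and the first-phase survivor cost at each of its nodes are given as two lists; the numbers and the helper names `metric_profile` and `rising_points` are hypothetical. It also assumes the survivor cost at a start node is zero, so that the initial estimate equals the survivor cost at the final node.

```python
def metric_profile(path_cost, survivor_cost):
    """Metric at each node of a path in subtrellis i.

    path_cost[k]     : cost of the path from s_i up to its k-th node.
    survivor_cost[k] : first-phase survivor cost at that node.
    The metric is e(s_i, f_i) plus the excess of the path over the survivor.
    """
    initial_estimate = survivor_cost[-1]  # e(s_i, f_i), survivor cost at f_i
    return [initial_estimate + (p - s)
            for p, s in zip(path_cost, survivor_cost)]


def rising_points(metrics):
    """Indices of nodes at which the metric increases along the path."""
    return [k for k in range(1, len(metrics)) if metrics[k] > metrics[k - 1]]


# Hypothetical five-node path s_i -> a -> b -> c -> f_i.
PATH_COST = [0, 2, 3, 6, 7]
SURVIVOR_COST = [0, 1, 2, 3, 4]

METRICS = metric_profile(PATH_COST, SURVIVOR_COST)
RISING = rising_points(METRICS)
CRITICAL = RISING[-1] if RISING else 0  # index of the last rising point

print(METRICS, RISING, CRITICAL)
```

In this example the metric rises at the second and fourth nodes, the fourth node is the critical node, and from there on the metric equals the actual path cost.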

Lemma 5.6: Let subtrellises $T_i$ and $T_j$ share a node $N$ and, between them, let $T_i$ be the first to close the 
node in the second phase. Then $m_i(N) \leq m_j(N)$. 

Proof: Since $T_i$ is the first to close the node, it closes it either before $T_j$ was first opened or after. In the 
former case, $m_i(N) \leq m_j(s_j) \leq m_j(N)$. In the latter case, the least current metric of $T_j$ 
is at least the metric $m_i(N)$ of $T_i$, from which the result follows as the metric can only increase. ■ 

Lemma 5.7: For nodes $A$ and $B$ let $(A, B)$ be a path segment in the merging interval of $T_i$ and $T_j$ and let 
$m_i(A) \leq m_j(A)$. Then $m_i(B) \leq m_j(B)$. 

Proof: Since $m_i(A) \leq m_j(A)$ at $A$, and thereafter all updates to the metrics in trellises $T_i$ and $T_j$ until 
node $B$ is reached are identical, as the survivors at those nodes in the first phase are the same for both 
trellises $T_i$ and $T_j$, $m_i(B) \leq m_j(B)$. ■ 

We next show that any path from an arbitrary start node to any final node represents a vector in a vector 
space. For the sake of simplicity we restrict our arguments to binary codes. 

Lemma 5.8: The set of all labels from an arbitrary start node to any final node is a vector space. 

Proof: Assume that each of the $c$ vectors in the submatrix $G_c$ of the generator matrix is of the form 
$v_i = [h_i, \mathbf{0}, t_i]$, where $v_i$ has circular span $[j,k]$, $h_i$ stands for the sequence of symbols from the first 
up to and including the $k$-th symbol and is called the head, and $t_i$ stands for the sequence of symbols from 
positions $j$ to $n - 1$ and is called the tail; $\mathbf{0}$ represents the run of zero symbols in between the head and 
the tail, spanning the appropriate number of codeword indices. (This run may be empty if $j = k + 1$.) Let 
$\{v_1, v_2, \ldots, v_c\}$ be the vectors of $G_c$. Then the matrix $G_s$ defined as 

$G_s = \begin{bmatrix} G_l \\ G'_c \end{bmatrix}$, 

where $G'_c$ consists of $2c$ rows of the form $[h_i, \mathbf{0}]$, $[\mathbf{0}, t_i]$, $1 \leq i \leq c$ (where the number of zeroes in $\mathbf{0}$ 
makes up a total of $n$ elements for the row), generates the set of labels of all paths from any start node to any 
final node. This set has $2^{l+2c}$ elements. This can be verified from the product construction. The set of elements 
of this vector space consists of semicodewords and codewords. Each semicodeword is the label of an 
$(s_i, f_j)$ path, $i \neq j$. ■ 
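The construction of $G_s$ from the proof can be sketched for binary codes as follows. The length-6 generator rows, the span positions and the helper names `split_circular_row`, `build_gs` and `span_gf2` are hypothetical and chosen only for illustration; the count $2^{l+2c}$ is reproduced here because the rows of this toy $G_s$ are linearly independent.

```python
from itertools import product


def split_circular_row(row, head_end, tail_start):
    """Split v = [h, 0, t] into the two rows [h, 0] and [0, t].

    The head occupies positions 0..head_end and the tail occupies
    positions tail_start..n-1 (0-based indices, an assumption of this sketch).
    """
    n = len(row)
    head = [row[p] if p <= head_end else 0 for p in range(n)]
    tail = [row[p] if p >= tail_start else 0 for p in range(n)]
    return head, tail


def build_gs(g_linear, g_circular):
    """Stack the linear-span rows on top of the split circular-span rows."""
    rows = [list(r) for r in g_linear]
    for row, head_end, tail_start in g_circular:
        rows.extend(split_circular_row(row, head_end, tail_start))
    return rows


def span_gf2(rows):
    """All GF(2) linear combinations of the given rows, as a set of tuples."""
    n = len(rows[0])
    return {tuple(sum(c * r[p] for c, r in zip(coeffs, rows)) % 2
                  for p in range(n))
            for coeffs in product((0, 1), repeat=len(rows))}


# Hypothetical code with l = 1 linear-span row and c = 1 circular-span row.
G_LINEAR = [[0, 1, 1, 1, 0, 0]]
G_CIRCULAR = [([1, 1, 0, 0, 1, 1], 1, 4)]  # head = positions 0..1, tail = 4..5

GS = build_gs(G_LINEAR, G_CIRCULAR)
SPAN = span_gf2(GS)
CODE = span_gf2(G_LINEAR + [row for row, _, _ in G_CIRCULAR])

print(GS)
print(len(SPAN), len(CODE), sorted(SPAN - CODE))
```

The codewords form a subset of the span of $G_s$, and the remaining elements play the role of semicodewords in this toy example.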
Example 5.1: The matrix $G_s$ corresponding to the matrix $G_{KV}$ for the Hamming (7,4) code of Example 3.1 
is displayed below. 



1 1 1 
10 111 



G s 



1 
1 1 
1110 
1 

It can be observed that the semicodeword 1100110 formed by adding rows 1 and 3 of G s traces a path from 
start vertex S2 to final vertex fi in the tail-biting trellis of Figure ^ 

Lemma 5.9: The algorithm will not close any node whose metric exceeds the cost of the ML path. 

Proof: The lemma follows from lines 6 and 7 of function Expand and the observation that calling 
function Expand on a node is equivalent to closing the node. The test ensures that only nodes with metric 
value less than the current metric are closed. Since the current metric is a lower bound on the cost of the ML 
path the lemma follows. ■ 

We use a result of Tendolkar and Hartmann [32] stated below. 



Lemma 5.10: Let $H$ be the parity check matrix of the code and let a codeword $x$ be transmitted as a 
signal vector $S(x)$. Let the binary quantization of the received vector $r = (r_1, r_2, \ldots, r_n)$ be denoted by $y$. Let 
$r' = (|r_1|, |r_2|, \ldots, |r_n|)$ and $s = yH^T$. Then ML decoding is achieved by decoding a received vector $r$ into 
the codeword $y + e$, where $e$ is a binary vector that satisfies $s = eH^T$ and has the property that if $e'$ is any 
other binary vector such that $s = e'H^T$ then $e \cdot r' \leq e' \cdot r'$, where $\cdot$ is the inner product. 

A direct consequence of Lemma 5.10 is the following result. 

Lemma 5.11: If the all-zero codeword is the ML codeword for an error pattern e then 

$e \cdot r' \leq (c + e) \cdot r'$ (5) 

for any non-zero codeword c. 
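The decoding rule of Lemma 5.10 can be compared against a direct correlation decoder on a small code. The sketch below is illustrative only: it uses a systematic Hamming (7,4) generator and parity check pair chosen for convenience (not the matrix $G_{KV}$ of Example 5.1), assumes antipodal signaling with bit 0 mapped to +1 and bit 1 to -1, and searches the coset by brute force. The received vector is a made-up example.

```python
from itertools import product

# Systematic Hamming (7,4) pair G = [I | P], H = [P^T | I]; an assumption
# of this sketch, chosen so that G * H^T = 0 over GF(2).
P = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]]
G = [[1 if i == j else 0 for j in range(4)] + P[i] for i in range(4)]
H = [[P[i][j] for i in range(4)] + [1 if k == j else 0 for k in range(3)]
     for j in range(3)]

CODEWORDS = [tuple(sum(m[i] * G[i][p] for i in range(4)) % 2 for p in range(7))
             for m in product((0, 1), repeat=4)]


def syndrome(vec):
    """Syndrome vec * H^T over GF(2)."""
    return tuple(sum(v * h for v, h in zip(vec, row)) % 2 for row in H)


def decode_by_error_pattern(r):
    """Decode r as y + e, with e the least-reliability-weight coset member."""
    y = tuple(1 if x < 0 else 0 for x in r)   # binary quantization
    reliab = [abs(x) for x in r]              # r' = (|r_1|, ..., |r_n|)
    s = syndrome(y)
    e = min((e for e in product((0, 1), repeat=7) if syndrome(e) == s),
            key=lambda e: sum(ei * w for ei, w in zip(e, reliab)))
    return tuple((a + b) % 2 for a, b in zip(y, e))


def decode_by_correlation(r):
    """Brute-force ML decoding: maximise correlation with the signal vector."""
    return max(CODEWORDS,
               key=lambda c: sum(x * (1 - 2 * b) for x, b in zip(r, c)))


R = [0.9, -0.2, 1.1, 0.4, -0.3, 0.8, 1.3]     # hypothetical received vector
print(decode_by_error_pattern(R), decode_by_correlation(R))
```

For this received vector both decoders return the all-zero codeword, which is the situation assumed in Lemma 5.11: the chosen error pattern has the smallest reliability weight in its coset.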

Since the space explored by the algorithm, namely the space of semicodewords and codewords, is a vector 
space, we can analyse the algorithm assuming that the ML codeword is the all-zero codeword. 

Lemma 5.12: Assume the all-zero codeword is the ML codeword. Let $e$ be the binary quantization of the 
received vector. For the error pattern $e$, the second phase of the decoding algorithm will close the start nodes 
of only those subtrellises whose initial metric corresponds to a semicodeword $c_s$ satisfying 

$(c_s + e) \cdot r' \leq e \cdot r'$ (6) 

Proof: We first note that at the start of the second phase the metrics at the start nodes of all residual 
subtrellises correspond to the costs of vectors in the vector space of codewords and semicodewords, i.e. the 
vector space defined by the generator matrix $G_s$. From Lemma 5.8 we have $(c_s + e) H_s^T = e H_s^T$, where $H_s$ is 
the parity check matrix corresponding to the matrix $G_s$. From Lemma 5.10, maximum likelihood decoding on 
the set of semicodewords will initially choose $c_s$, a semicodeword which satisfies the inequality of the lemma, 
and the algorithm will close the start node of the subtrellis with that initial metric. As the algorithm proceeds 
with updating metrics it may close start nodes of other subtrellises. However, by Lemma 5.9 it will never close 
the start node of any trellis $T_j$ whose initial metric exceeds that of the ML codeword, which implies that the 
all-zero codeword is more likely than the semicodeword survivor in $T_j$, thus implying Equation (6). ■ 

The properties of the algorithm proved in this section will be used to explain the good performance of the 
approximate algorithms described in the following section. 

VI. An Approximate Algorithm 

Recall that each shared node is treated as a distinct node in the second phase of the algorithm. We now 
propose an approximate variant of the exact algorithm which closes a shared node at most once in the second 
phase. We term this algorithm Approx1. 

Assume we replace line 5 of function Expand by 
if succ.state ∉ S then Update(trellisnumber, state, succ.state, succ.metric, index) else continue 

What this ensures is that each shared node is closed at most once, that is, by at most one subtrellis, in the 
second phase. Therefore the total number of Viterbi updates in the first phase and expansions in the second 
phase is at most $2V$, where $V$ is the number of states in the tail-biting trellis. Since a node is closed by at 
most one subtrellis, it is conceivable that a shared node that is on the ML path is closed by a subtrellis that 
does not contain the ML codeword. In such a case the result produced will not be the ML codeword. We now 
analyse the conditions under which this happens. The symbols are the same as those defined for Lemma 5.12. 

The following theorem gives the conditions under which the approximate algorithm produces a non-ML 
output. Recall that the intersection property requires that the intersection of all the zero runs of vectors in $G'_c$ 
be non-empty. 
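The effect of closing each shared node at most once can be seen in a much simplified best-first search. The sketch below is not the Expand and Update functions of the decoder: it uses the plain accumulated path cost as the metric instead of the estimate-based metric, works on a hypothetical unrolled graph, and models the membership test with a set of already closed nodes, which is an assumption about the role of that test. The function name `best_first_close_once` is illustrative.

```python
import heapq


def best_first_close_once(edges, starts):
    """Best-first search over several subtrellises sharing nodes.

    edges  : dict mapping a node to a list of (successor, edge cost).
    starts : dict mapping a trellis number to (start node, initial metric).
    Returns a dict mapping each closed node to the trellis that closed it.
    Each node is closed by at most one trellis.
    """
    heap = [(metric, t, node) for t, (node, metric) in starts.items()]
    heapq.heapify(heap)
    closed_by = {}
    while heap:
        metric, t, node = heapq.heappop(heap)
        if node in closed_by:
            continue                      # already closed by some trellis
        closed_by[node] = t
        for succ, w in edges.get(node, []):
            if succ not in closed_by:     # modified test: skip closed nodes
                heapq.heappush(heap, (metric + w, t, succ))
    return closed_by


# Hypothetical graph shared by two subtrellises with start nodes s0 and s1.
EDGES = {"s0": [("a", 1), ("b", 5)], "s1": [("a", 4), ("b", 1)],
         "a": [("f", 1)], "b": [("f", 3)]}
STARTS = {0: ("s0", 0), 1: ("s1", 0)}

CLOSED_BY = best_first_close_once(EDGES, STARTS)
print(CLOSED_BY)
```

Since every node appears at most once in the returned dictionary, the number of closures is bounded by the number of nodes, which is the counting argument behind the $2V$ bound.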

Theorem 6.1: If the tail-biting trellis satisfies the intersection property, the approximate algorithm produces 
a non-ML output for error patterns $e$ satisfying Equation (6) whenever $c_s$ is a semicodeword which is formed 
as a linear combination of rows of $G_s$ that contain at least one non-zero multiple of a vector from $G_l$. 

Proof: Let us assume that the all-zero codeword is the ML codeword but that it is not the output of the 
approximate algorithm Approx1. Therefore some trellis, say $T_i$, must close a node $N$ on the all-zero path (so that 
$T_0$ never gets to close it, as only one closure is allowed, and therefore cannot output the all-zero path). Clearly 
node $N$ must be in the merging interval of $T_0$ and $T_i$. Since $T_i$ is a residual trellis (otherwise it would 
not have participated in the second phase), let the survivor at $f_i$ in the first phase be an $(s_k, f_i)$ path that overtakes 
the $(s_i, N, f_i)$ path at node $A$; in other words, $A$ is the critical node for trellis $T_i$. 

Case 1. Suppose node $A$ is a predecessor of node $N$. By Lemma 5.5, $m_i(A) = m_i(f_i)$, and since $A$ is a 
critical node, by Lemma 5.4 $T_i$ would have gone on to win in the exact algorithm, and therefore the all-zero 
codeword could not have been the ML codeword, giving a contradiction. 

Case 2. Suppose node $A$ is a successor of $N$ within the merging interval of $T_i$ and $T_0$. By Lemma 5.3, 
$m_i(A) = m_i(f_i)$. Since the all-zero codeword is the ML codeword, $m_i(f_i) > m_0(f_0)$, implying that $m_i(A) > m_0(f_0)$. Since 
subtrellis $T_i$ closed node $N$, by Lemma 5.6 $m_i(N) \leq m_0(N)$. By the property of the metric, $m_0(N) \leq m_0(f_0)$, 
implying that $m_i(N) \leq m_0(f_0)$. Since $A$ is in the merging interval of $T_0$ and $T_i$, by Lemma 5.7 
$m_i(A) \leq m_0(A) \leq m_0(f_0)$, giving a contradiction. Therefore we conclude that if subtrellis $T_i$ closes $N$ and 
$A$ is a successor of $N$, then $A$ cannot be in the merging interval of $T_i$ and $T_0$. 

We thus conclude that $A$ is beyond the merging interval of $T_0$ and $T_i$, and hence the $(s_k, A, f_i)$ path does not 
touch the all-zero path. Since the intersection property is satisfied, any path which is a linear combination of 
vectors of $G'_c$ alone must have at least one node on the all-zero path. Hence the semicodeword corresponding 
to the $(s_k, A, f_i)$ path cannot be formed as a linear combination of rows only in $G'_c$, and therefore it is formed 
as a linear combination of vectors with at least one member of $G_l$. ■ 

Theorem 6.1 and Lemmas 5.11 and 5.12 provide an explanation of the experimental observation that 
decoding differences between the exact and the approximate algorithm are infrequent, so much so that the bit 
error rate curves are practically indistinguishable. Lemma 5.12 tells us that in order for a subtrellis to be opened 
it must contain a semicodeword satisfying Equation (6), this being the most likely semicodeword among the possible 
candidates. Theorem 6.1 establishes that if a node on the all-zero path is closed by some trellis 
$T_i$ other than $T_0$ when the all-zero codeword was transmitted, then the initial metric of $T_i$ must be that of a 
semicodeword of fairly high weight (because it is a linear combination of vectors which contain at least one 
vector in $G_l$). Further, the error $e$ which caused the cost of this high weight semicodeword to drop 
enough to satisfy Equation (6) should not cause the weight of any non-zero codeword to drop by an amount 
enough to violate Equation (5). Since semicodewords share prefixes and suffixes with codewords, such events 
may be quite infrequent. 

One could get an even better approximation by allowing a node to be closed at most twice. We have 
experimented with this and observe that the bit error rate for this approximation is indistinguishable from that 
of the exact algorithm at all values of signal to noise ratio for all the three codes on which we have run the 
simulations. The significance of this is that the time complexity can be explicitly bounded by the complexity 
of at most three computations for each node of the tail-biting trellis, one update in the first Viterbi decoding 
phase and at most two expansions in the second phase. 

A. Complexity Analysis 

We now estimate the time complexity of the approximate algorithm. The following bound on the complexity 
of the Viterbi algorithm is well known [21]. 

Lemma 6.1: The complexity of the first phase of the decoding algorithm is $O(E)$, where $E$ is the number 
of edges in the tail-biting trellis. 

The next lemma is a statement of a well known result on heap data structures [2]. 

Lemma 6.2: Each insertion into the heap has complexity $O(\log H)$, where $H$ is the number of elements in 
the heap. 

Theorem 6.2: The algorithm Approx1 has complexity bounded by $O(E \log V)$, where $V$ is the number of 
states in the tail-biting trellis. 

Proof: The number of vertices that are updated is at most $2V$, as each vertex is expanded at most once 
in the second phase. Each time a vertex is expanded, it results in computations on every edge leaving it and 
at most a constant number of elements being visited and inserted into the heap $S$ (as this number is bounded 
by the field size, assumed to be a constant). The complexity of each insertion is $O(\log H)$, where $H$ is the 
size of the heap. Since this size is proportional to $V$, the complexity of the second phase is $O(E \log V)$. The 
sorting operation at the end of the first phase has complexity $O(V_0 \log V_0)$, where $V_0$ is the number of states 
at time index 0. The complexity is dominated by the $O(E \log V)$ term, and hence the theorem. ■ 

To reduce the overheads, the heap is implemented as m separate heaps if there are m residual trellises, 
with a separate heap of pointers, each element of which points to the root of a distinct subtrellis heap. The 
individual heap sizes are small in practice and the algorithm is practically linear in the size of the trellis. In 
the next section we present results from profiling the program which bear out the claim that the overheads of 
heap operations are negligible. 
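One possible way to organise a heap per residual trellis together with a top-level heap over their roots is sketched below. This is an assumption-laden illustration, not the data structure used in the profiled program: the class name `SubtrellisHeaps` is hypothetical, and the top-level heap stores (root metric, trellis index) pairs with lazy invalidation rather than pointers.

```python
import heapq


class SubtrellisHeaps:
    """m separate heaps plus a top-level heap over their current roots."""

    def __init__(self, m):
        self.sub = [[] for _ in range(m)]  # one heap of (metric, node) each
        self.top = []                      # (root metric, trellis); may be stale

    def _publish_root(self, t):
        """Record the current root of heap t in the top-level heap."""
        if self.sub[t]:
            heapq.heappush(self.top, (self.sub[t][0][0], t))

    def push(self, t, metric, node):
        heapq.heappush(self.sub[t], (metric, node))
        if self.sub[t][0] == (metric, node):   # the new item became the root
            self._publish_root(t)

    def pop_min(self):
        """Remove and return (metric, trellis, node) with the least metric."""
        while self.top:
            metric, t = heapq.heappop(self.top)
            # Discard stale entries whose heap is empty or has a new root.
            if self.sub[t] and self.sub[t][0][0] == metric:
                _, node = heapq.heappop(self.sub[t])
                self._publish_root(t)
                return metric, t, node
        raise IndexError("pop from empty SubtrellisHeaps")


heaps = SubtrellisHeaps(2)
heaps.push(0, 5, "a")
heaps.push(1, 3, "b")
heaps.push(0, 4, "c")
heaps.push(1, 7, "d")
ORDER = [heaps.pop_min() for _ in range(4)]
print(ORDER)
```

Each insertion touches only the heap of one subtrellis, so its cost depends on the size of that heap rather than on the total number of open nodes.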

An argument similar to that in Theorem 6.2 establishes the complexity of algorithm Approx2 as $O(E \log V)$. 
We next look at the space complexity of the algorithm. 

Lemma 6.3: The space requirement for algorithm Approx1 is $O(V_0 \times V)$ bits. 

Proof: The algorithm requires $O(V)$ space to store the estimates at each state in the first phase. The 
additional space required to store the heap is also $O(V)$, as each expanded node can put at most all its successors 
on the heap. The bit vectors that store trellis membership are of size $V_0$, where $V_0$ is the number of start nodes of 
the tail-biting trellis. The space requirement for the bit vectors is therefore $V_0 \times V$ bits. The space requirement 
for storing the current cost at each node is $O(V)$. This follows from the fact that each shared node is closed 
at most once, which means that at most one copy of a shared node updates its successors. This in turn means 
that each successor has at most one update along each of its incoming edges. Since the number of incoming 
edges is a constant which is at most the size of the field, a constant number of costs are associated with each 
node in the tail-biting trellis, from which the result follows. ■ 

VII. Simulations 

We have coded the exact and approximate algorithms and show the results of simulations on an AWGN channel 
with antipodal signaling on minimal tail-biting trellises: the 16 state tail-biting trellis [6] for the extended (24,12) 
Golay code, and tail-biting trellises for two rate 1/2 convolutional codes, one with memory 6 and circle 
size 48 (which is the same as the (554,744) convolutional code experimented with in [5]), and one with memory 4 and circle 
size 20 (which is the same as the (72,62) convolutional code used in [4]). We show the variation of 
both the average and the maximum number of node computations (counting Viterbi updates in the first 
phase and expansions in the second phase) with the signal to noise ratio for our exact algorithm, and compare 
this with the number of Viterbi updates needed for the brute force approach. Note that this number is indicative 
of the time complexity of the algorithm. The results are encouraging and are displayed in Tables I, II and III 
for the Golay code and the two convolutional codes respectively. On average, obtaining 
the exact ML result requires fewer than two computations at each node of the tail-biting trellis at all values of 
signal to noise ratio, one in the first pass and one in the second. The maximum number of node computations 
for the algorithm Approx1 is bounded by twice the number of nodes in the tail-biting trellis. We 
also display the bit error rate performance of the approximate algorithms, closing nodes at most once for the 
first approximation, Approx1, and at most twice for the second approximation, Approx2, in Figures 6, 7 
and 8, and find that there is virtually no difference in the bit error rates for the second approximation and the 
exact ML algorithm. Thus we get virtually ML performance for an explicit linearly bounded update complexity 
at all values of signal to noise ratio. 
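The explicit bounds can be set against the brute force update counts quoted in the captions of Tables I, II and III. The short sketch below only performs this arithmetic; the dictionary layout and the function name `node_computation_bounds` are illustrative, and node computations and brute force Viterbi updates are treated as comparable units only as a rough indication.

```python
# State counts and brute force update counts as quoted in the table captions.
TRELLISES = {
    "Golay (24,12)": {"states": 192, "brute_force_updates": 1744},
    "rate 1/2 (133,171), circle 48": {"states": 3072,
                                      "brute_force_updates": 159552},
    "rate 1/2 (35,31), circle 20": {"states": 320,
                                    "brute_force_updates": 4368},
}


def node_computation_bounds(states):
    """Worst-case node computations: one first-phase update per node plus
    at most one (Approx1) or two (Approx2) second-phase expansions."""
    return {"Approx1": 2 * states, "Approx2": 3 * states}


BOUNDS = {name: node_computation_bounds(t["states"])
          for name, t in TRELLISES.items()}

for name, t in TRELLISES.items():
    ratio = t["brute_force_updates"] / BOUNDS[name]["Approx2"]
    print(f"{name}: {BOUNDS[name]}, brute force / Approx2 bound = {ratio:.1f}")
```

In all three cases even the looser bound for the second approximation lies well below the brute force count.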

VIII. Discussion and Conclusions 

We have proposed an exact algorithm for ML decoding on tail-biting trellises and also experimented on 
two approximate variants. The average time complexity of the exact algorithm is seen to be quite low. The 
approximate variants perform as well as the exact one in terms of the bit error rate at an explicitly bounded 
update complexity equivalent to two, or sometimes three rounds on the tail-biting trellis. The algorithm does 
not suffer from the effects of limit cycles or pseudocodewords which current iterative algorithms are subject 
to. Profiling measurements carried out on the program are displayed in Table IVIIII The execution time was 
averaged over 10,000 runs of the decoder. The percentage of execution time taken up by each of the five 
major operations in the decoding process, namely, the initializations of all the arrays, the first pass, the sorting 
operation at the end of the first pass, the second pass, and the heap operations is displayed. It can be observed 
that heap operations incur an overhead of only 11% of the program running time at 0 dB and are negligible 
for higher values of signal to noise ratio. 
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Fig. 6. BER for the Exact and Approximate Algorithms for the (24,12) Extended Binary Golay Code 
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TABLE I 

Runtime statistics for the Exact algorithm for the (24,12) Extended Binary Golay Code. A brute force 
algorithm would typically perform 1744 updates. The tail-biting trellis has 192 states. 



The results of simulations on the extended (24,12) Golay code, a rate 1/2, memory 6 convolutional code 
with a circle size of 48 (which is the same as the (554,744) convolutional code used for experiments in [5]), 
and a rate 1/2, memory 4 convolutional code with a circle size of 20 (which is the same as the (72,62) rate 
1/2 convolutional code used for experimentation in [4]) have been reported. It is seen that the second approximate 
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Fig. 7. Bit Error Rates for the Exact and Approximate Algorithms for the rate 1/2 (133,171) Convolutional Code with circle length 48 
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Fig. 8. Bit Error Rates for the Exact and Approximate Algorithms for the rate 1/2 (35,31) Convolutional Code with circle length 20 
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TABLE II 

Runtime statistics for the Exact algorithm for the rate 1/2 (133,171) convolutional code with circle length 48. 
A brute force algorithm would typically perform 159552 updates. The tail-biting trellis has 3072 states. 
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TABLE III 

Runtime statistics for the Exact algorithm for the rate 1/2 (35,31) convolutional code with circle length 20. A 
brute force algorithm would typically perform 4368 updates. The tail-biting trellis has 320 states. 



variant has a bit error rate which is indistinguishable from that of the exact algorithm for all values of signal 
to noise ratio. 
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